Bayes-Adaptive POMDPs: A New Perspective on the Explore-Exploit Tradeoff in Partially Observable Domains
Authors
Abstract
Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date have focused on the standard Markov Decision Process (MDP). Our goal is to extend these ideas to the more general Partially Observable MDP (POMDP) framework, in which the state is a hidden variable. This difficult decision-making problem can be formulated cleanly by simply extending the state to include the model parameters themselves; however, closed-form solutions are then no longer possible. This paper explores a family of approximations for solving this problem. These approximations trade off between (1) improving knowledge of the POMDP domain through interaction with the environment, (2) resolving uncertainty about the current state, and (3) choosing actions with high expected reward.
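To make the construction concrete, here is a brief sketch in LaTeX of the extended ("hyper-") state space; the Dirichlet-count notation is assumed for illustration rather than quoted from the paper:

    S' = S \times \Phi \times \Psi, \qquad
    \Pr(s' \mid s, a, \phi) = \frac{\phi^{a}_{s s'}}{\sum_{s''} \phi^{a}_{s s''}}, \qquad
    \Pr(z \mid s', a, \psi) = \frac{\psi^{a}_{s' z}}{\sum_{z'} \psi^{a}_{s' z'}}

Here \phi and \psi are Dirichlet counts over the unknown transition and observation distributions; after experiencing (s, a, s', z), the two matching counts are each incremented by one. The belief is then tracked jointly over (s, \phi, \psi), which is exactly why closed-form solutions stop being available and approximations are needed.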
Similar resources
Learning in POMDPs with Monte Carlo Tree Search
The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation an...
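As a concrete illustration of how learning during execution can be combined with planning (a sketch in the spirit of root-sampling methods such as BA-POMCP, not code from the entry above), the Python below draws a state and a model from Dirichlet posterior counts at the root, then runs UCB-guided tree search over action-observation histories. The toy domain sizes, the reward function, and all identifiers are assumptions:

    import math
    import random
    from collections import defaultdict

    S, A, Z = 2, 2, 2            # toy numbers of states, actions, observations (assumed)
    GAMMA, UCB_C = 0.95, 2.0     # discount factor and UCB exploration constant
    DEPTH, SIMS = 15, 2000       # simulation horizon and simulations per decision

    def reward(s, a):
        # Hypothetical reward: an action pays off when it matches the state.
        return 1.0 if s == a else 0.0

    def draw(probs):
        # Sample an index from a categorical distribution.
        r, acc = random.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                return i
        return len(probs) - 1

    def sample_dirichlet(alpha):
        # Draw a categorical distribution from Dirichlet(alpha).
        g = [random.gammavariate(a, 1.0) for a in alpha]
        t = sum(g)
        return [x / t for x in g]

    class BAPOMCP:
        """Root-sampling tree search over action-observation histories."""

        def __init__(self):
            self.N = defaultdict(int)     # visits per (history, action)
            self.Q = defaultdict(float)   # running value per (history, action)
            self.Nh = defaultdict(int)    # visits per history node

        def search(self, particles):
            # particles: list of (state, trans_counts, obs_counts); the counts map
            # (state, action) -> Dirichlet alphas over next states / observations.
            for _ in range(SIMS):
                s, tc, oc = random.choice(particles)
                # Root sampling: commit to one model drawn from the posterior counts.
                T = {k: sample_dirichlet(v) for k, v in tc.items()}
                O = {k: sample_dirichlet(v) for k, v in oc.items()}
                self._simulate(s, T, O, (), 0)
            return max(range(A), key=lambda a: self.Q[((), a)])

        def _simulate(self, s, T, O, hist, depth):
            if depth >= DEPTH:
                return 0.0
            self.Nh[hist] += 1

            def ucb(a):
                n = self.N[(hist, a)]
                if n == 0:
                    return float("inf")
                return self.Q[(hist, a)] + UCB_C * math.sqrt(math.log(self.Nh[hist]) / n)

            a = max(range(A), key=ucb)
            s2 = draw(T[(s, a)])
            z = draw(O[(s2, a)])
            ret = reward(s, a) + GAMMA * self._simulate(s2, T, O, hist + ((a, z),), depth + 1)
            self.N[(hist, a)] += 1
            self.Q[(hist, a)] += (ret - self.Q[(hist, a)]) / self.N[(hist, a)]
            return ret

    # Usage sketch with uniform priors and a single root particle in state 0:
    # prior_t = {(s, a): [1.0] * S for s in range(S) for a in range(A)}
    # prior_o = {(s, a): [1.0] * Z for s in range(S) for a in range(A)}
    # best_action = BAPOMCP().search([(0, prior_t, prior_o)])

A full implementation would also expand nodes lazily, use rollouts beyond the tree, and re-root the search after each real action-observation pair.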
Exploration in POMDPs
In recent work, Bayesian methods for exploration in Markov decision processes (MDPs) and for solving known partially-observable Markov decision processes (POMDPs) have been proposed. In this paper we review the similarities and differences between those two domains and propose methods to deal with them simultaneously. This enables us to attack the Bayes-optimal reinforcement learning problem in...
Sample-based Search Methods for Bayes-adaptive Planning
A fundamental issue for control is acting in the face of uncertainty about the environment. Amongst other things, this induces a trade-off between exploration and exploitation. A model-based Bayesian agent optimizes its return by maintaining a posterior distribution over possible environments and considering all possible future paths. This optimization is equivalent to solving a Markov Decision...
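The sentence cut off above refers to the Bayes-adaptive construction for the fully observable case: the posterior is summarized by Dirichlet counts \phi, and planning becomes an MDP over hyper-states (s, \phi). A sketch of its Bellman equation, with notation assumed for illustration:

    V(s, \phi) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} \frac{\phi^{a}_{s s'}}{\sum_{s''} \phi^{a}_{s s''}} \, V\big(s', \phi + \delta^{a}_{s s'}\big) \Big]

where \delta^{a}_{s s'} adds one to the count of the transition just taken. The hyper-state space grows with every distinct count vector, which is what motivates the sample-based search methods this entry describes.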
Planning in Stochastic Domains: Problem Characteristics and Approximation
This paper is about planning in stochastic domains by means of partially observable Markov decision processes (POMDPs). POMDPs are difficult to solve. This paper considers problems where one does not know the true state of the world but nevertheless has a fairly good idea about it, and uses such problem characteristics to transform POMDPs into approximately equivalent ones that are much easier to solve...
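One generic shortcut in this spirit (shown purely as an illustration; it is not necessarily the transformation this paper proposes) is the most-likely-state heuristic: when the belief is sharply peaked, act as if the most probable state were the true one, and fall back to an information-gathering action otherwise. A minimal sketch, with all names and the threshold assumed:

    import math

    def entropy(belief):
        # Shannon entropy of a discrete belief over states.
        return -sum(p * math.log(p) for p in belief if p > 0.0)

    def choose_action(belief, mdp_policy, info_action, threshold=0.5):
        # If the belief is concentrated, follow a policy computed for the
        # underlying fully observable MDP at the most likely state; otherwise
        # take a designated information-gathering action (both are assumed inputs).
        if entropy(belief) < threshold:
            s_ml = max(range(len(belief)), key=lambda s: belief[s])
            return mdp_policy[s_ml]
        return info_action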
Journal:
Volume / Issue:
Pages: -
Publication date: 2008